i-Code: An Integrative and Composable Multimodal Learning Framework


Abstract

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel merge- and co-attention mechanisms to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five multimodal understanding tasks and single-modality benchmarks, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
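To make the cross-modality contrastive objective mentioned in the abstract more concrete, the following is a minimal sketch of a generic symmetric InfoNCE-style loss between two batches of paired embeddings. It is not taken from the paper: the function names, the temperature value of 0.07, and the toy vectors are all assumptions chosen for illustration, and the actual i-Code objective may differ in its details.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length (assumes vec is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical helper, not the paper's code).

    emb_a[i] and emb_b[i] are embeddings of the same sample in two
    modalities; every other pairing in the batch is treated as a negative.
    """
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    # Cosine similarity of every a-row against every b-row, scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        for va in a
    ]
    logits_t = [list(col) for col in zip(*logits)]  # b-to-a direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy example: two samples, 2-dimensional embeddings per modality.
vision = [[1.0, 0.0], [0.0, 1.0]]
speech_aligned = [[0.9, 0.1], [0.1, 0.9]]   # matches the vision ordering
speech_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # pairs are swapped

print(contrastive_loss(vision, speech_aligned))
print(contrastive_loss(vision, speech_shuffled))
```

The loss is small when matching pairs are the most similar entries in the batch and large when they are not, which is the property that pushes different modalities of the same sample toward a shared representation space.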

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i9.26290